Non-audible murmur recognition based on fusion of audio and visual streams

Authors

Panikos Heracleous

Norihiro Hagita

Abstract

Non-Audible Murmur (NAM) is an unvoiced speech signal that can be captured through body tissue using special acoustic sensors (i.e., NAM microphones) attached behind the talker’s ear. In a NAM microphone, body transmission and the loss of lip radiation act as a low-pass filter, so higher-frequency components are attenuated in a NAM signal. Owing to factors such as this spectral reduction, the unvoiced nature of NAM, and the type of articulation, NAM sounds become more similar to one another, causing more confusions than in normal speech. In the present article, visual information extracted from the talker’s facial movements is fused with NAM speech using three fusion methods, and phoneme classification experiments are conducted. The experimental results reveal a significant improvement when NAM speech is fused with facial information.
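
The abstract does not name its three fusion methods, so the following is only a minimal sketch of two schemes commonly used in audio-visual recognition: feature-level fusion (concatenating frame-synchronous vectors) and decision-level fusion (a weighted sum of per-phoneme log-likelihoods). The function names, the stream weight of 0.7, and all scores are hypothetical illustration values, not taken from the article.

```python
def feature_fusion(audio_frames, visual_frames):
    """Feature-level fusion: concatenate per-frame audio and visual vectors.

    Assumes both streams were already resampled to the same frame rate.
    """
    if len(audio_frames) != len(visual_frames):
        raise ValueError("streams must have the same number of frames")
    return [list(a) + list(v) for a, v in zip(audio_frames, visual_frames)]


def decision_fusion(audio_loglik, visual_loglik, audio_weight=0.7):
    """Decision-level fusion: weighted sum of per-phoneme log-likelihoods.

    audio_weight is a hypothetical stream weight in [0, 1]; the visual
    stream receives 1 - audio_weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    scores = {
        phoneme: audio_weight * audio_loglik[phoneme]
        + (1.0 - audio_weight) * visual_loglik[phoneme]
        for phoneme in audio_loglik
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy scores (made up): the audio stream alone slightly prefers "t",
# while the visual stream clearly separates the two candidates.
audio_scores = {"p": -4.0, "t": -3.9}
visual_scores = {"p": -1.0, "t": -3.0}

best, fused = decision_fusion(audio_scores, visual_scores)
print(best)  # p
```

In this toy case the audio scores are nearly tied, which mirrors the kind of confusion the abstract attributes to NAM, and the visual stream tips the fused decision. How the streams should actually be weighted is an empirical question that this sketch does not address.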

Related articles

Investigating the role of the Lombard reflex in non-audible murmur (NAM) recognition

In this paper, we report non-audible murmur (NAM) recognition results in noisy environments and investigate the effect of the Lombard reflex on non-audible murmur recognition. Non-audible murmur is speech uttered very quietly and captured through body tissue by a special acoustic sensor (e.g., a NAM microphone). A system based on non-audible murmur recognition can be applied in cases when privacy ...

Applications of Nammicrophone for Privacy in Human-machi

In this paper, we present the use of stethoscope and silicon NAM microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker’s ear and can capture not only normal (audible) speech, but also very quietly uttered speech (non-audible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when ...

Two-Level Bimodal Association for Audio-Visual Speech Recognition

This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, where cross-modal association is considered in two levels. First, the acoustic and the visual data streams are combined at the feature level by using the canonical correlation analysis, which deals with the problems of audio-visual synchronization and utilizing the cross-modal correlation. Second...

Multi-level Fusion of Audio and Visual Features for Speaker Identification

This paper explores the fusion of audio and visual evidence through a multi-level hybrid fusion architecture based on dynamic Bayesian network (DBN), which combines model level and decision level fusion to achieve higher performance. In model level fusion, a new audio-visual correlative model (AVCM) based on DBN is proposed, which describes both the intercorrelations and loose timing synchroni...

Joint processing of audio and visual information for multimedia indexing and human-computer interaction

Information fusion in the context of combining multiple streams of data, e.g., audio streams and video streams corresponding to the same perceptual process, is considered in a somewhat generalized setting. Specifically, we consider the problem of combining visual cues with audio signals for the purpose of improved automatic machine recognition of descriptors, e.g., speech recognition/transcription,...

Publication date: 2010
